Soybean [Glycine max (L.) Merrill] is one of the most important leguminous crops and ranks fourth after 
rice, wheat and maize in terms of world crop production. Soybean contains abundant protein and oil, which 
makes it a major source of nutritious food, livestock feed and industrial products. In Japan, soybean is also 
an important source of traditional staples such as tofu, natto, miso and soy sauce. The soybean genome 
sequence was determined in 2010. Given the enormous size of the genome, physical mapping and genome 
sequencing are the most effective approaches towards understanding the structure and function of the 
soybean genome. We constructed bacterial artificial chromosome (BAC) libraries from the Japanese soybean 
cultivar Enrei. The end sequences of approximately 100,000 BAC clones were analyzed and used for 
construction of a BAC-based physical map of the genome. BLAST analysis between Enrei BAC-end sequences 
and the Williams82 genome was carried out to increase the saturation of the map. This physical map will be 
used to characterize the genome structure of Japanese soybean cultivars, to develop methods for the 
isolation of agronomically important genes and to facilitate comparative soybean genome research. The 
current status of physical mapping of the soybean genome and construction of the database are presented. 
Introduction 

In 2010, the soybean genome was sequenced and assembled 
by the Soybean Genome Sequencing Consortium in the USA 
(Schmutz et al. 2010). The genome data are available via the 
databases Phytozome (http://www.phytozome.net/soybean) 
and SoyBase (Grant et al. 2010) (http://soybase.org/). Other 
soybean genomes were sequenced using next-generation 
sequencers (Kim et al. 2010, Lam et al. 2010). SoyBase is an 
essential site and tool for soybean researchers investigating 
genetics, molecular biology, breeding and genomics. Although 
this database is important for soybean research, Williams82 
genome data alone are insufficient for Japanese soybean 
research. We therefore selected Enrei, a common cultivar in 
Japan, to construct a physical map, decode the genome 
sequence and build a genome database. 
BAC library construction 

BAC libraries were constructed from nuclear DNA prepared 
from young leaves of Enrei (Baba et al. 2000). Two restriction 
endonucleases, HindIII and MboI, were used for partial 
digestion of the DNA. Partially digested and size-selected 
DNA (100-180 kb) was ligated into the BAC vector 
pIndigoBAC5 (Epicentre Biotechnologies) and then transformed 
into E. coli ElectroMAX DH10B cells (Life Technologies). 
We picked 80,000 clones from the HindIII digest and 
100,000 clones from the MboI digest, and designated the 
HindIII library GMJENa and the MboI library GMJENb. 
Insert sizes were 140 and 100 kb for the GMJENa and 
GMJENb libraries, respectively. Clones were stored in 
384-well microplates at -80°C. 
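
The depth of each library can be estimated from the clone counts and insert sizes given above. The following is a minimal sketch, not part of the original analysis; the genome size of about 1.1 Gb is an assumption that is not stated in this section, and the function name `genome_equivalents` is hypothetical.

```python
# Assumption: haploid soybean genome size of ~1.1 Gb (not given in this section).
ASSUMED_GENOME_SIZE_BP = 1_100_000_000


def genome_equivalents(n_clones: int, insert_kb: float,
                       genome_bp: int = ASSUMED_GENOME_SIZE_BP) -> float:
    """Return the total insert length of a library divided by the genome size."""
    return n_clones * insert_kb * 1_000 / genome_bp


# Clone numbers and insert sizes as reported for the two Enrei libraries.
libraries = {
    "GMJENa": (80_000, 140),   # HindIII digest
    "GMJENb": (100_000, 100),  # MboI digest
}

for name, (n_clones, insert_kb) in libraries.items():
    depth = genome_equivalents(n_clones, insert_kb)
    print(f"{name}: {depth:.1f} genome equivalents")
```

Under the assumed genome size, both libraries come out at roughly 9 to 10 genome equivalents; a different genome size estimate scales the result proportionally.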

Table 1. Statistics of the "Enrei" BAC-based physical map based on the 20 chromosomes 

BAC clones mapped on other scaffolds are not shown. 

BAC number: number of BAC clones mapped on each chromosome. 

BAC contig: number of contigs on each chromosome. 

Single BAC contig: number of contigs consisting of a single BAC clone. 

Total length: length of each chromosome in base pairs. 

Covered length: total size of the regions covered by BAC clones. 

Total gap length: total size of the regions not covered by BAC clones. 

Cover rate: (covered length)/(total length) × 100 (%). 

End sequencing of BAC clones 

Both ends of all clones of GMJENa and of 20,000 clones of 
GMJENb were sequenced by the BigDye Terminator method 
(Life Technologies) on an ABI 3730xl capillary sequencer 
(Life Technologies) (Katagiri et al. 2004). The sequence data 
obtained were analyzed with Phred/Phrap software 
(Ewing and Green 1998, Ewing et al. 1998). After exclusion 
of low-quality bases (Phred score <30), the average read length 
of the BAC-end sequences was 650 bases. 
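
A Phred score Q corresponds to an estimated base-calling error probability of 10^(-Q/10), so the threshold of 30 used for the BAC-end reads keeps bases with an estimated error rate of at most 1 in 1,000. The sketch below illustrates a simple per-base filter and the resulting mean read length. It is an illustration only: the exact trimming procedure is not described here, and the function names and toy quality values are hypothetical.

```python
def phred_to_error_prob(q: float) -> float:
    """Convert a Phred quality score to the estimated error probability."""
    return 10 ** (-q / 10)


def high_quality_length(quals, min_q: int = 30) -> int:
    """Count the bases in one read whose Phred score is at least min_q."""
    return sum(1 for q in quals if q >= min_q)


def mean_high_quality_length(reads, min_q: int = 30) -> float:
    """Average number of retained bases per read after the quality filter."""
    lengths = [high_quality_length(quals, min_q) for quals in reads]
    return sum(lengths) / len(lengths)


# Toy per-base quality scores for two short reads (hypothetical values).
toy_reads = [
    [35, 40, 12, 31],
    [29, 30, 30, 8],
]

print(f"{phred_to_error_prob(30):.3f}")     # error probability at Q30
print(mean_high_quality_length(toy_reads))  # mean retained length per read
```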

Mapping of BAC clones and construction of the physical map 

To identify the physical position of each sequenced clone, 
the end sequences were compared by BLASTN with the Williams82 
genome assembly (Glyma1.09), and the end-sequenced BAC 
clones were then mapped on each chromosome of the assembly. 
In total, 59,361 BAC clones were mapped on the Williams82 
genome (58,997 on the 20 chromosomes and 364 on other 
scaffolds), and 91% of the genome was covered by 
Enrei BAC clones (Table 1). We also detected differences 
between the Enrei BAC-end sequences and the Williams82 
genome assembly: the mismatch rate was 0.2-0.5% and the 
deletion rate was less than 0.1% for each chromosome. 
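
The per-chromosome statistics defined for Table 1 can be derived from the mapped clone positions by merging overlapping clones into contigs. The following is a minimal sketch of that bookkeeping, assuming each mapped clone is reduced to a (chromosome, start, end) interval with 1-based inclusive coordinates. The function names and the toy coordinates are hypothetical, and the criteria the authors used to accept a BLASTN placement are not described here.

```python
from collections import defaultdict


def build_contigs(clones):
    """Merge overlapping clone intervals into contigs, per chromosome.

    clones: iterable of (chrom, start, end), 1-based inclusive coordinates.
    Returns {chrom: [(start, end, n_clones), ...]} sorted by start.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in clones:
        by_chrom[chrom].append((start, end))

    contigs = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        merged = []
        for start, end in intervals:
            if merged and start <= merged[-1][1]:
                # Clone overlaps the current contig: extend it.
                merged[-1][1] = max(merged[-1][1], end)
                merged[-1][2] += 1
            else:
                # No overlap: start a new contig.
                merged.append([start, end, 1])
        contigs[chrom] = [tuple(c) for c in merged]
    return contigs


def chromosome_stats(contigs, total_length):
    """Compute Table 1 style statistics for one chromosome."""
    covered = sum(end - start + 1 for start, end, _ in contigs)
    return {
        "bac_number": sum(n for _, _, n in contigs),
        "bac_contig": len(contigs),
        "single_bac_contig": sum(1 for _, _, n in contigs if n == 1),
        "total_length": total_length,
        "covered_length": covered,
        "total_gap_length": total_length - covered,
        "cover_rate": 100 * covered / total_length,
    }


# Toy example: three clones on a 500-bp "chromosome" (hypothetical values).
toy_clones = [("Gm01", 1, 100), ("Gm01", 80, 200), ("Gm01", 301, 400)]
toy_contigs = build_contigs(toy_clones)
print(chromosome_stats(toy_contigs["Gm01"], total_length=500))
```

In the toy example the first two clones overlap and form one contig, the third forms a single-BAC contig, and the cover rate is 60%.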



DaizuBase 

We constructed an integrated soybean genome database, 
DaizuBase (http://daizu.dna.affrc.go.jp). The database 
consists of GBrowse, a Unified Map and a BLAST search. The 
GBrowse page shows the BAC-based physical map, and the 
Unified Map page shows the linkage map and DNA markers; 
both are based on the Williams82 genome assembly. GBrowse 
provides tracks for the DNA sequence, BAC ends, BAC 
contigs, GC content, ESTs, full-length cDNAs (Umezawa et al. 
2008) and DNA markers (Fig. 1). DaizuBase also provides 
sequence, keyword and position search systems. 

The prospects 

Using the Roche/454 next-generation sequencer GS-FLX 
Titanium (Margulies et al. 2005), sequence data equivalent to 
10 times the size of the genome of the Japanese soybean 
cultivar Enrei have already been obtained. After analyzing the 
data, we will upload the Enrei genome data into DaizuBase. 

The database will provide SNP and indel data for the 
Enrei and Williams82 genomes. 
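
As an illustration of how SNP and indel differences between two cultivars can be tallied, the sketch below counts mismatches, insertions and deletions in a gapped pairwise alignment. It is a simplified example and not the pipeline used for DaizuBase: the function name, the toy sequences and the choice of denominators for the rates are all assumptions.

```python
def compare_aligned(ref: str, query: str) -> dict:
    """Count differences between two gapped, equal-length aligned sequences.

    '-' marks a gap. A gap in the query is counted as a deletion and a gap
    in the reference as an insertion, one count per alignment column.
    """
    if len(ref) != len(query):
        raise ValueError("aligned sequences must have the same length")

    mismatches = insertions = deletions = aligned = 0
    for r, q in zip(ref.upper(), query.upper()):
        if r == "-" and q == "-":
            continue  # column with gaps in both sequences carries no information
        if r == "-":
            insertions += 1
        elif q == "-":
            deletions += 1
        else:
            aligned += 1
            if r != q:
                mismatches += 1

    ref_bases = aligned + deletions
    return {
        "mismatches": mismatches,
        "insertions": insertions,
        "deletions": deletions,
        # Assumed definitions: mismatches per aligned base pair,
        # deletions per reference base, both in percent.
        "mismatch_rate": 100 * mismatches / aligned if aligned else 0.0,
        "deletion_rate": 100 * deletions / ref_bases if ref_bases else 0.0,
    }


# Toy alignment (hypothetical sequences): one mismatch and one deleted base.
toy_ref = "ACGTACGTAC"
toy_query = "ACGAAC-TAC"
print(compare_aligned(toy_ref, toy_query))
```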

Enrei genome data will be useful for distinguishing domestic 
soybean genomes and isolating important genes. Furthermore, 
sequencing of the genomes of various Japanese cultivars is 
progressing using next-generation sequencers. These genomic 
data will be useful for establishing DNA markers for 
Japanese cultivars. 

Fig. 1. Browsing DaizuBase. A) DaizuBase top page with links to GBrowse, Unified Map and BLAST search. B) GBrowse shows BAC-based 
physical map data. C) Unified Map shows relationships among the linkage map, DNA markers and BAC-end sequences. D) Sequence search systems 
using BLAST. 